293 research outputs found

    Evolving Lucene search queries for text classification

    We describe a method for generating accurate, compact, human understandable text classifiers. Text datasets are indexed using Apache Lucene and Genetic Programs are used to construct Lucene search queries. Genetic programs acquire fitness by producing queries that are effective binary classifiers for a particular category when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from classification tasks
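
The fitness idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the index is modelled as a plain dictionary rather than Apache Lucene, the query is a simple OR of words, and the use of F1 as the fitness measure is an assumption (the abstract only says queries should be effective binary classifiers). All names are hypothetical.

```python
from typing import Dict, Set


def or_query_matches(query_terms: Set[str], docs: Dict[str, Set[str]]) -> Set[str]:
    """Return ids of documents containing at least one query term (a toy OR query)."""
    return {doc_id for doc_id, words in docs.items() if query_terms & words}


def f1_fitness(query_terms: Set[str], docs: Dict[str, Set[str]], positives: Set[str]) -> float:
    """Score a query as a binary classifier for one category (F1 is an assumed measure)."""
    retrieved = or_query_matches(query_terms, docs)
    true_positives = len(retrieved & positives)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(positives)
    return 2 * precision * recall / (precision + recall)


# Toy training set (illustrative only): document id -> set of words.
docs = {
    "d1": {"wheat", "harvest", "export"},
    "d2": {"wheat", "price"},
    "d3": {"oil", "price"},
    "d4": {"oil", "export"},
}
grain = {"d1", "d2"}  # documents labelled with the target category

good = f1_fitness({"wheat"}, docs, grain)  # retrieves exactly the category
weak = f1_fitness({"price"}, docs, grain)  # retrieves one right, one wrong
```

An evolutionary search would keep candidate queries like `{"wheat"}` that score higher than alternatives like `{"price"}`.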

    A tool for creating and visualising formal concept trees

    This paper presents a tool for creating and visualising formal concept trees. The concept tree provides an alternative visualisation to the more commonly known concept lattice. The tool described here is an extension of the In-Close formal concept mining program, where concepts are output in a format that can be visualised in a Web Browser using the Collapsible Tree Layout from the D3.js JavaScript library. Because the visualisation is expandable and collapsible, the tool is able to deal with large trees and the user is able to explore branches with single mouse clicks and by panning and zooming the tree. So-called ‘iceberg trees’ can also be produced, by specifying a minimum support for objects
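
The 'iceberg tree' idea above can be sketched as a pruning step over a nested tree. This is a hedged illustration rather than the In-Close tool's actual output format: the `name`/`children` keys follow the convention commonly used in D3 hierarchy examples, while the `objects` field and the `prune` function are hypothetical.

```python
import json
from typing import Any, Dict


def prune(node: Dict[str, Any], min_support: int) -> Dict[str, Any]:
    """Keep only child concepts whose object count meets the minimum support."""
    kept = [
        prune(child, min_support)
        for child in node.get("children", [])
        if len(child["objects"]) >= min_support
    ]
    return {"name": node["name"], "objects": node["objects"], "children": kept}


# Toy concept tree (illustrative only): each node lists the objects it covers.
tree = {
    "name": "root",
    "objects": [1, 2, 3, 4],
    "children": [
        {
            "name": "A",
            "objects": [1, 2, 3],
            "children": [{"name": "A1", "objects": [1], "children": []}],
        },
        {"name": "B", "objects": [4], "children": []},
    ],
}

iceberg = prune(tree, min_support=2)
iceberg_json = json.dumps(iceberg)  # nested JSON that a tree layout could load
```

Because a child concept never covers more objects than its parent, dropping a node also drops its whole branch, which keeps large trees manageable.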

    A comparison of Lucene search queries evolved as text classifiers

    In this article, we use a genetic algorithm to evolve seven different types of Lucene search query with the objective of generating accurate and readable text classifiers. We compare the effectiveness of each of the different types of query using three commonly used text datasets. We vary the number of words available for classification and compare results for 4, 8, and 16 words per category. The generated queries can also be viewed as labels for the categories and there is a benefit to a human analyst in being able to read and tune the classifier. The evolved queries also provide an explanation of the classification process. We consider the consistency of the classifiers and compare their performance on categories of different complexities. Finally, various approaches to the analysis of the results are briefly explored

    Document clustering with evolved search queries

    Search queries define a set of documents located in a collection and can be used to rank the documents by assigning each document a score according to its closeness to the query in the multidimensional space of weighted terms. In this paper, we describe a system whereby an island model genetic algorithm (GA) creates individuals which can generate a set of Apache Lucene search queries for the purpose of text document clustering. A cluster is specified by the documents returned by a single query in the set. Each document that is included in only one of the clusters adds to the fitness of the individual and each document that is included in more than one cluster will reduce the fitness. The method can be refined by using the ranking score of each document in the fitness test. The system has a number of advantages; in particular, the final search queries are easily understood and offer a simple explanation of the clusters, meaning that an extra cluster labelling stage is not required. We describe how the GA can be used to build queries and show results for clustering on various data sets and with different query sizes. Results are also compared with clusters built using the widely used k-means algorithm
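
The clustering fitness described above can be sketched as below. This is a minimal sketch under stated assumptions: each query result is given as a set of document ids, and the reward and penalty are both weighted 1, which the abstract does not specify. The function name is hypothetical, and the ranking-score refinement is not modelled.

```python
from collections import Counter
from typing import Iterable, Set


def cluster_fitness(query_results: Iterable[Set[str]]) -> int:
    """Reward documents returned by exactly one query; penalise documents returned by several."""
    counts = Counter(doc for result in query_results for doc in result)
    unique = sum(1 for c in counts.values() if c == 1)
    overlapping = sum(1 for c in counts.values() if c > 1)
    return unique - overlapping  # equal weights are an assumption


# Two individuals, each a set of queries represented by the documents they return.
disjoint = cluster_fitness([{"a", "b"}, {"c"}])          # no document is shared
overlapped = cluster_fitness([{"a", "b"}, {"b", "c"}])   # "b" falls in both clusters
```

Under this scoring, an individual whose queries partition the collection cleanly outranks one whose queries overlap.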

    Elimination of pain improves specificity of clinical diagnostic criteria for adult chronic rhinosinusitis

    Objective: Determine whether the elimination of pain improves accuracy of clinical diagnostic criteria for adult chronic rhinosinusitis. Study Design: Retrospective cohort study. Methods: History, symptoms, nasal endoscopy, and computed tomography (CT) results were analyzed for 1,186 adults referred to an academic otolaryngology clinic with presumptive diagnosis of chronic rhinosinusitis. Clinical diagnosis was rendered using the 1997 Rhinosinusitis Taskforce (RSTF) Guidelines and a modified version eliminating facial pain, ear pain, dental pain, and headache. Results: Four hundred seventy-nine subjects (40%) met inclusion criteria. Among subjects positive by RSTF guidelines, 45% lacked objective evidence of sinonasal inflammation by CT, 48% by endoscopy, and 34% by either modality. Applying modified RSTF diagnostic criteria, 39% lacked sinonasal inflammation by CT, 38% by endoscopy, and 24% by either modality. Using either abnormal CT or endoscopy as the reference standard, modified diagnostic criteria yielded a statistically significant increase in specificity from 37.1% to 65.1%, with a nonsignificant decrease in sensitivity from 79.2% to 70.3%. Analysis of comorbidities revealed temporomandibular joint disorder, chronic cervical pain, depression/anxiety, and psychiatric medication use to be negatively associated with objective inflammation on CT or endoscopy. Conclusion: Clinical diagnostic criteria overestimate the prevalence of chronic rhinosinusitis. Removing facial pain, ear pain, dental pain, and headache increased specificity without a concordant loss in sensitivity. Given the high prevalence of sinusitis, improved clinical diagnostic criteria may assist primary care providers in more accurately predicting the presence of inflammation, thereby reducing inappropriate antibiotic use or delayed referral for evaluation of primary headache syndromes. Level of Evidence: 4. Laryngoscope, 127:1011-1016, 201
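
For readers unfamiliar with the two measures reported above, the sketch below shows how sensitivity and specificity are computed against a reference standard. The counts are toy numbers chosen for easy arithmetic; they are not taken from the study, and the function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of reference-positive subjects that the criteria flag as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of reference-negative subjects that the criteria flag as negative."""
    return true_neg / (true_neg + false_pos)


# Toy counts (NOT study data): 10 subjects with inflammation, 10 without.
sens = sensitivity(true_pos=8, false_neg=2)
spec = specificity(true_neg=6, false_pos=4)
```

Dropping symptoms that often occur without inflammation reduces false positives, which is how specificity can rise while sensitivity changes little.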

    Optimal Insulin Delivery

    Insulin therapy is only effective if it is delivered into the right tissue in the right way. Exogenous insulin is intended for the subcutaneous (SC) tissue, not the muscle or skin. If delivered into the latter, its absorption (pharmacokinetics (PK)) and action (pharmacodynamics (PD)) are unpredictable, which often leads to poor glucose control. Correct insulin therapy begins with matching the insulin to the site used. Typically, four sites are used for insulin injection or infusion: the abdomen lateral to the umbilicus all the way to the flanks, the anterior lateral upper half of the thigh, the deltoid region of the arm, and the upper outer quadrant of the buttocks. Regular insulin and neutral protamine Hagedorn (NPH) are both absorbed more rapidly from the arm and abdominal sites and more slowly from the thigh and buttocks. The newer insulin analogs, both rapid- and slow-acting, do not appear to be influenced by the site used for injection. In order to avoid intramuscular (IM) injections, patients should use the shortest needles currently available (the 4-mm pen needle and the 6-mm syringe needle). Very young children should raise a skin fold and inject into it even when using the 4-mm needle. Giving injections with the 6-mm needle at a 45° angle converts this needle into the equivalent of the 4 mm. Injection sites should be rigorously rotated, with the new injection being approximately 1 cm from previous injections. This measure helps prevent the most common complication of injection therapy, lipohypertrophy (LH). Injecting into LH leads to unstable PK and PD and deregulated glucose control, manifested as unexpected hypoglycemia, glycemic variability, and elevated HbA1c values. Comprehensive insulin delivery recommendations have recently been published
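
The statement above that a 6-mm needle at 45° is equivalent to a 4-mm needle can be checked with simple trigonometry. This is a geometric sketch only, assuming a fully inserted straight needle and a flat skin surface; the function name is hypothetical.

```python
import math


def vertical_depth_mm(needle_length_mm: float, angle_degrees: float) -> float:
    """Depth below the skin surface reached by a needle inserted at the given angle."""
    return needle_length_mm * math.sin(math.radians(angle_degrees))


depth_angled = vertical_depth_mm(6.0, 45.0)    # about 4.24 mm
depth_straight = vertical_depth_mm(4.0, 90.0)  # 4 mm when inserted perpendicular
```

The angled 6-mm needle reaches roughly the same depth as a perpendicular 4-mm needle, which matches the abstract's claim.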

    Towards a social media research methodology: Defining approaches and ethical concerns

    Social media research and suitable methodologies and ethical approaches for analysing social media data are still emerging. This paper presents a methodology for projects using social media data alongside consideration of ethics within the social media analysis context. Earlier stages of the methodology will be expanded to develop a strategy for examining ethics alongside consideration of the relevant analysis techniques that may be employed. This will provide a comprehensive methodology that will provide a springboard for the clear and ethically sound scrutiny of social media data. We aim to present the challenges of using social media data, while the inclusion of ethical and legal aspects in this paper aims to draw researchers' attention to the peculiarity issues involved with dealing with social media data

    Towards a cloud migration decision support system for Small and Medium enterprises in Tamil Nadu

    Cloud computing is a promising computing paradigm which has the potential to speed up Information Technology adoption among SMEs in developing economies like India. The user-friendly, pay-per-use cloud computing model offers SMEs access to highly scalable and reliable cloud infrastructure without having to invest in buying and maintaining expensive Information Technology resources. However, moving data and applications to a cloud infrastructure is not straightforward and can be very challenging as decision makers need to consider numerous aspects before deciding to adopt cloud infrastructure. A review of the literature reveals that there are frameworks available to support cloud migration. However, there are no frameworks, models or tools available to support the whole cloud migration process. This research aims to fill that gap by proposing a conceptual framework for a cloud migration decision support system targeted at SMEs in Tamil Nadu

    Evolving text classification rules with genetic programming

    We describe a novel method for using genetic programming to create compact classification rules using combinations of N-grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that the rules may have a number of other uses beyond classification and provide a basis for text mining applications
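
The rule evaluation described above can be sketched as follows. This is a simplified illustration, not the paper's method: a rule is modelled as "match if any character N-gram occurs in the text", whereas the abstract describes richer combinations built from functions and terminals. The documents and all names are hypothetical.

```python
from typing import Iterable, List, Tuple


def rule_matches(ngrams: Iterable[str], text: str) -> bool:
    """A toy rule: the document matches if it contains any of the character N-grams."""
    return any(gram in text for gram in ngrams)


def precision_recall(ngrams: List[str], labelled_docs: List[Tuple[str, bool]]) -> Tuple[float, float]:
    """Evaluate a rule against (text, belongs_to_category) training pairs."""
    tp = fp = fn = 0
    for text, is_positive in labelled_docs:
        predicted = rule_matches(ngrams, text.lower())
        if predicted and is_positive:
            tp += 1
        elif predicted and not is_positive:
            fp += 1
        elif is_positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labelled documents (illustrative only); True marks the target category.
training = [
    ("Wheat exports rose", True),
    ("Grain harvest delayed", True),
    ("Oil prices fell", False),
    ("Heat wave hits refinery", False),
]

specific = precision_recall(["whea", "grai"], training)
loose = precision_recall(["heat"], training)  # "heat" also occurs inside "wheat"
```

The second rule shows why N-gram choice matters: a short string can match inside unrelated words, which lowers precision.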